Abstract. In this paper, we introduce an optimized approach for querying
similar images in large digital-image databases. Our work is based on the
RBIR (region-based image retrieval) method, which uses multiple regions
as the key for retrieving images. This method significantly improves the
accuracy of queries, but it also increases the computational cost. To
reduce this cost, we implement a binary signature encoder which maps an
image to its identification in binary. To speed up the lookup, the binary
signatures of images are classified with the help of the S-kGraph. Finally,
our work is evaluated on COREL's images.


1. Introduction 


There are three common approaches to image retrieval [1]: text-based image
retrieval (TBIR), content-based image retrieval (CBIR) and semantic-based
image retrieval (SBIR). In text-based image retrieval, describing the content
of an image is difficult and time-consuming. Thus, it is necessary to build a
retrieval system based on the content of images in order to find similar
images. Furthermore, when an image is queried through a keyword or an index,
its features cannot be described visually. So we need a method of extracting
the features of an image in order to find images with similar content.
Extracting the visual features of an image is an important task of the
content-based image retrieval process. However, retrieving and comparing the
content of images directly is complicated, time-consuming and costly in
storage space. For this reason, when comparing the content of images, we
should pay attention to query speed and storage space.

A number of works related to querying image content have been published
recently, such as: extracting image objects based on the change of histogram
values [1]; similar-image retrieval based on the comparison of characteristic
regions and the similarity relationship of feature regions on images [2];
color image retrieval based on the detection of local feature regions by
Harris-Laplace [3]; color image retrieval based on bit planes and the L*a*b*
color space [4]; converting the color space and building a hash table in
order to query the content of color images [5]; the similarity of images
based on the combination of image colors and texture [9]; using the EMD
distance in image retrieval [10]; and the image indexing and retrieval
technique VBA (Variable-Bin Allocation) based on signature bit strings and
the S-tree [11].

However, if the method of comparing the similarity of content is ineffective,
the query returns images whose content is not related to the request. The
approach of this paper is to create a binary signature for each image. The
paper aims to query "similar images" efficiently in a large image database
system.

The paper approaches the semantic description of image content through a
binary signature and builds a data structure to store binary signatures. This
data structure represents the relationship among the binary signatures and
hence among the contents of the images. Based on the semantic relationship
described by this data structure, the paper finds images with similar content
in COREL's image database [6]. The paper makes two main contributions:
reducing the amount of query storage and speeding up image queries on a
large image database.

The problem: Given an image database $\Im$. For each image $J \in \Im$,
extract the feature region vector $R^J = (R^J_1, R^J_2, \ldots, R^J_{n_J})$ to
describe the visual features of the image. Each feature region $R^J_i$ is
described by a binary signature $Sig(R^J_i)$. For each query image $I$, the
feature region vector $R^I = (R^I_1, R^I_2, \ldots, R^I_{n_I})$ is extracted and
described by the binary signature $Sig(R^I) = \bigcup_i Sig(R^I_i)$.

Let $\phi(I,J) = \phi(R^I, R^J) = g(Sig(R^I), Sig(R^J))$ be a similarity
function between images $I$ and $J$. Then, for each query image $I$, we need
to determine a set of images $Q \subset \Im$ which is ordered on the basis of
the similarity measure $\phi$.

To solve the problem, we build a measure, called the similarity measure, to
assess the similarity between two images. Based on this similarity measure,
an ordered set of similar images corresponding to the query image is
selected. At the same time, based on this order relation, the graph data
structure S-kGraph is built to describe the similarity relationship among the
contents of images. On the basis of this data structure, the paper proposes
an algorithm which creates the S-kGraph and a similar-image retrieval
algorithm on the S-kGraph. To illustrate the theory, the paper reports
experiments on a set of COREL images.

The contribution of the paper is an approach to the semantic description of
image content through binary signatures, together with a data structure to
store these binary signatures. The data structure captures the relationship
among the binary signatures, which in turn describes the relationship among
the contents of images. Based on the semantic relationship described by this
data structure, the paper finds images with similar content in the COREL
image database [6].

The paper is organized as follows. Section 1 is the introduction. Section 2
presents the theoretical basis of the binary signature of an image and the
similarity measure between images. Section 3 presents the data structure and
the image retrieval algorithm based on the S-kGraph. Section 4 describes the
application and assesses the experimental results of the process of finding
similar images. A conclusion and a discussion of future work are given in
Section 5.


2. The similarity measure 


According to [7], the binary signature is formed by hashing the data objects,
and it has $k$ bits 1 and $(m - k)$ bits 0 in the bit chain $[1..m]$, where
$m$ is the length of the binary signature. Data objects and query objects are
encoded with the same algorithm. When the bits in the query signature are
completely covered by the bits in the data object signature, the data object
is a candidate for the query. There are three cases: (1) the data object
matches the query: each bit in the query signature $s_q$ is covered by the
corresponding bit in the signature $s_i$ of the data object (i.e.,
$s_q \wedge s_i = s_q$); (2) the object does not match the query (i.e.,
$s_q \wedge s_i \neq s_q$); (3) the signatures are compared and match, but the
result is a false drop.
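
The covering test in cases (1) and (2) can be sketched as follows. This is a minimal illustration that assumes signatures are stored as Python integers used as bit sets; the function name `matches` and the sample signatures are hypothetical and do not come from [7].

```python
def matches(query_sig: int, data_sig: int) -> bool:
    """Return True when every 1-bit of the query is also set in the data signature."""
    return (query_sig & data_sig) == query_sig


# Hypothetical 8-bit signatures, for illustration only.
query = 0b0101_0000
objects = {"a": 0b0111_0010, "b": 0b0100_0011, "c": 0b1101_0000}

# Objects whose signature covers the query are candidates; a candidate may
# still turn out to be a false drop when the real data is checked.
candidates = [name for name, sig in objects.items() if matches(query, sig)]
```

A candidate only passes the signature filter; case (3) is why the underlying objects still have to be verified.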

In order to evaluate the similarity between two images, the paper first
builds a binary signature to describe the visual features of each image. On
the basis of this binary signature, the paper builds the similarity measure
between two images. The binary signature $Sig(I)$ of image $I$ is defined as
follows:

Definition 2.1. Let $F = (F_1, \ldots, F_{n_F})$ be a vector describing the
feature values of region $R^I_i$ of image $I$. Let
$F(R^I_i) = (F_1(R^I_i), \ldots, F_{n_F}(R^I_i))$ be the vector of region
feature values, standardized on $[0,1]$ (i.e., $F_j(R^I_i) \in [0,1]$,
$i = 1, \ldots, n_I$, $j = 1, \ldots, n_F$). We set
$B^j_i = b^j_1 b^j_2 \ldots b^j_m$ with $b^j_k = 1$ if
$k = [F_j(R^I_i) \times m]$, otherwise $b^j_k = 0$, $k = 1, \ldots, m$. Then
the binary signature of feature region $R^I_i$ is defined as
$Sig(R^I_i) = B^1_i B^2_i \ldots B^{n_F}_i$. The binary signature of image $I$
is $Sig(I) = Sig(R^I) = \bigcup_i Sig(R^I_i)$.
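
A minimal sketch of this encoding is given below. It assumes that the bracket $[\cdot]$ denotes rounding up (the definition does not state the rounding rule), that bit position 1 is the leftmost bit of each block, and that the union of signatures is a bitwise OR; the function names are hypothetical.

```python
import math


def encode_feature(value: float, m: int) -> int:
    """Encode one feature value in [0, 1] as an m-bit block with at most one 1-bit.

    Assumption: [value * m] is taken as the ceiling, so a value of 0 sets no bit.
    """
    k = math.ceil(value * m)
    return 0 if k == 0 else 1 << (m - k)  # position 1 is the leftmost bit


def region_signature(features, m: int) -> int:
    """Concatenate the m-bit blocks of all feature values of one region."""
    sig = 0
    for value in features:
        sig = (sig << m) | encode_feature(value, m)
    return sig


def image_signature(regions, m: int) -> int:
    """Union (bitwise OR) of the signatures of all feature regions."""
    sig = 0
    for features in regions:
        sig |= region_signature(features, m)
    return sig


# Two regions with two features each, m = 4 bits per feature.
example = image_signature([[0.25, 1.0], [0.5, 1.0]], m=4)
```

With these inputs the two region signatures are `10000001` and `01000001`, and their union is `11000001`.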

In order to increase the accuracy of image queries with respect to the
matching feature regions, we need to match the positions of feature regions
between the images. For this reason, we need to determine the center
positions of feature regions. The center position of a feature region is
defined as follows:

Definition 2.2. Let $R^I = (R^I_1, R^I_2, \ldots, R^I_{n_I})$ be a vector of
feature regions of image $I(x,y)$. Then each feature region $R^I_i \in R^I$
has center $C(R^I_i) = (x_0, y_0) = ((x_s - x_e)/2, (y_s - y_e)/2)$, where
$d_E((x_s, y_s), (x_e, y_e)) = \max\{d_E((x_{a_i}, y_{a_i}), (x_{a_j}, y_{a_j}))
\mid (x_{a_i}, y_{a_i}), (x_{a_j}, y_{a_j}) \in Boundary(R^I_i)\}$, with $d_E$
the Euclidean distance and $Boundary(R^I_i)$ the boundary of feature region
$R^I_i$.

On the basis of the binary signatures and the centers of feature regions, let
$R^I_i$ and $R^J_j$ be feature regions of images $I$ and $J$, respectively.
The distance between two feature regions is then defined as follows:

Definition 2.3. Let $R^I = (R^I_1, R^I_2, \ldots, R^I_{n_I})$ and
$R^J = (R^J_1, R^J_2, \ldots, R^J_{n_J})$ be two vectors of feature regions of
two images $I(x,y)$ and $J(x,y)$. The distance between feature regions
$R^I_i \in R^I$ and $R^J_j \in R^J$ is
$\delta(R^I_i, R^J_j) = \|Sig(R^I_i) - Sig(R^J_j)\|_1 + d_E(C(R^I_i), C(R^J_j))$.
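
As an illustration of Definition 2.3, the sketch below assumes signatures are stored as integers of equal bit length, so that the $\|\cdot\|_1$ distance between two bit strings equals their Hamming distance; the function name and the sample values are hypothetical.

```python
import math


def region_distance(sig_a: int, center_a, sig_b: int, center_b) -> float:
    """delta = L1 distance of the bit strings + Euclidean distance of the centers."""
    hamming = bin(sig_a ^ sig_b).count("1")  # L1 norm of the bitwise difference
    return hamming + math.dist(center_a, center_b)


# Signatures differ in two bits; centers are 5 units apart.
d = region_distance(0b1010, (0.0, 0.0), 0b0110, (3.0, 4.0))
```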

In order to evaluate the correlation between the measures of images, the
following theorem shows that the distance $\delta(R^I_i, R^J_j)$ is a metric.

Theorem 2.1. If $R^I = (R^I_1, R^I_2, \ldots, R^I_{n_I})$ and
$R^J = (R^J_1, R^J_2, \ldots, R^J_{n_J})$ are two vectors of feature regions of
two images $I(x,y)$ and $J(x,y)$, then the distance $\delta(R^I_i, R^J_j)$ is
a metric.

Proof. (1) Suppose that $R^I_i$ and $R^J_j$ are two feature regions of $R^I$
and $R^J$. Then $\|Sig(R^I_i) - Sig(R^J_j)\|_1 \ge 0$ and
$d_E(C(R^I_i), C(R^J_j)) \ge 0$. Thus
$\delta(R^I_i, R^J_j) = \|Sig(R^I_i) - Sig(R^J_j)\|_1 + d_E(C(R^I_i), C(R^J_j)) \ge 0$.
Assume that
$\delta(R^I_i, R^J_j) = \|Sig(R^I_i) - Sig(R^J_j)\|_1 + d_E(C(R^I_i), C(R^J_j)) = 0$;
then $\|Sig(R^I_i) - Sig(R^J_j)\|_1 = 0$ and $d_E(C(R^I_i), C(R^J_j)) = 0$.
Furthermore, $\|\cdot\|_1$ and $d_E(\cdot,\cdot)$ are metrics. So
$Sig(R^I_i) = Sig(R^J_j)$ and $C(R^I_i) = C(R^J_j)$. We infer that
$\delta(R^I_i, R^J_j) \ge 0$ and
$\delta(R^I_i, R^J_j) = 0 \Leftrightarrow R^I_i = R^J_j$.

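(2) Let $k$ be a real number, then:

$\delta(kR^I_i, kR^J_j) = \|Sig(kR^I_i) - Sig(kR^J_j)\|_1 + d_E(C(kR^I_i), C(kR^J_j))$

$= \sum_l |k b^I_l - k b^J_l| + \sqrt{(k x^I_i - k x^J_j)^2 + (k y^I_i - k y^J_j)^2}$

$= |k| \sum_l |b^I_l - b^J_l| + \sqrt{k^2 (x^I_i - x^J_j)^2 + k^2 (y^I_i - y^J_j)^2}$

$= |k| \sum_l |b^I_l - b^J_l| + |k| \sqrt{(x^I_i - x^J_j)^2 + (y^I_i - y^J_j)^2}$

$= |k| \left( \sum_l |b^I_l - b^J_l| + \sqrt{(x^I_i - x^J_j)^2 + (y^I_i - y^J_j)^2} \right)$

$= |k| \times \left( \|Sig(R^I_i) - Sig(R^J_j)\|_1 + d_E(C(R^I_i), C(R^J_j)) \right) = |k| \times \delta(R^I_i, R^J_j)$.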

(3) Let $R^K = (R^K_1, R^K_2, \ldots, R^K_{n_K})$ be a vector of feature
regions of image $K$, then:

$\delta(R^I_i, R^J_j) + \delta(R^J_j, R^K_k) = (\|Sig(R^I_i) - Sig(R^J_j)\|_1 + d_E(C(R^I_i), C(R^J_j)))$

$+ (\|Sig(R^J_j) - Sig(R^K_k)\|_1 + d_E(C(R^J_j), C(R^K_k)))$

$= (\|Sig(R^I_i) - Sig(R^J_j)\|_1 + \|Sig(R^J_j) - Sig(R^K_k)\|_1)$

$+ (d_E(C(R^I_i), C(R^J_j)) + d_E(C(R^J_j), C(R^K_k)))$

$\ge \|Sig(R^I_i) - Sig(R^K_k)\|_1 + d_E(C(R^I_i), C(R^K_k)) = \delta(R^I_i, R^K_k)$.

From (1), (2), (3) we infer that $\delta(R^I_i, R^J_j)$ is a metric. ■

On the basis of the binary signatures and the feature regions of images, the
paper builds the similarity measure between two images, defined as follows:

Definition 2.4. Let $R^I = (R^I_1, R^I_2, \ldots, R^I_{n_I})$ and
$R^J = (R^J_1, R^J_2, \ldots, R^J_{n_J})$ be two vectors of feature regions of
two images $I(x,y)$ and $J(x,y)$. The similarity function between two images
$I$ and $J$ is defined as
$\phi(I,J) = \phi(R^I, R^J) = \|Sig(I) - Sig(J)\|_1 + d_E(C(I), C(J))$, where
$C(I) = \frac{1}{n_I} \sum_i C(R^I_i)$.
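
The similarity function can be sketched in the same style. The code assumes integer-encoded image signatures of equal length (so the norm reduces to a Hamming distance) and takes $C(I)$ as the mean of the region centers; the names `centroid` and `image_similarity` are hypothetical.

```python
import math


def centroid(centers):
    """Mean of the region centers, used as the image center C(I)."""
    n = len(centers)
    return (sum(x for x, _ in centers) / n, sum(y for _, y in centers) / n)


def image_similarity(sig_i: int, centers_i, sig_j: int, centers_j) -> float:
    """phi(I, J) = Hamming distance of signatures + distance between image centers."""
    hamming = bin(sig_i ^ sig_j).count("1")
    return hamming + math.dist(centroid(centers_i), centroid(centers_j))


phi_ij = image_similarity(0b1100, [(0, 0), (2, 0)], 0b1010, [(1, 3), (1, 5)])
```

Here the signatures differ in two bits and the image centers $(1, 0)$ and $(1, 4)$ are four units apart, so smaller values of $\phi$ mean more similar images.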

Lemma 2.1. The similarity function $\phi(I,J)$ between two images $I$ and $J$
is a metric.

Proof. Similar to Theorem 2.1. ■

The process of similar-image retrieval is to find a set of images whose
content is similar to that of the query image. On the basis of the similarity
measure of Definition 2.4, for each query image $I$, a set of similar images
$Q$ is defined as follows:

Definition 2.5 (Similarity Image Retrieval). Let
$\Re_I = \{J_i \mid (J_i \in \Im) \wedge (\phi(I,J_i) \le \phi(I,J_j) \Leftrightarrow
J_i \succeq J_j) \wedge (i \neq j) \wedge (i,j = 1, \ldots, n)\}$ be an ordered
set of images based on the measure $\phi$. A set of similar images
$Q \subset \Im$ consisting of $k$ similar images is
$Q = \{J_i \in \Im \mid \phi(I,J_i) = \phi(R^I, R^{J_i}) \le \theta(R^I, R^J),
\forall J \in \Im, i = 1, \ldots, k\}$, with $k = |Q|$, where
$\theta(R^I, R^J)$ is the threshold of $\phi(R^I, R^J)$.


After querying similar images based on the similarity measure $\phi$, we
need to rank the query result according to its similarity to the query image.
Therefore, the result set of similar images $Q$ must be ranked by the
similarity measure $\phi$. The following theorem shows that the result set
$Q$ is an ordered set.
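
Thresholding and ranking can be sketched together. In this illustration the database is a dictionary of hypothetical image names and integer signatures, and a plain Hamming distance stands in for $\phi$; both are assumptions made only to keep the example self-contained.

```python
def retrieve(query, database, phi, threshold):
    """Return the names of images with phi(query, image) <= threshold, most similar first."""
    scored = [(phi(query, item), name) for name, item in database.items()]
    return [name for score, name in sorted(scored) if score <= threshold]


def hamming(a: int, b: int) -> int:
    """Stand-in for phi: number of differing signature bits."""
    return bin(a ^ b).count("1")


database = {"J1": 0b1111, "J2": 0b1000, "J3": 0b1011}
ranked = retrieve(0b1010, database, hamming, threshold=1)
```

Ties in the measure are broken by name here, which is a choice of the sketch rather than something the definition prescribes.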

Theorem 2.2. If $I$ is the query image, then the set of similar images
$Q \subset \Im$ is an ordered set under the relation $\succeq$.

Proof. (1) Reflexivity: If $I$ is the query image and $J \in Q$ is any image,
then $\phi(I,J) = \phi(I,J)$, i.e. the condition $\phi(I,J) \le \phi(I,J)$ is
satisfied. Hence $J \succeq J$, i.e. $Q$ is reflexive under $\succeq$.

(2) Antisymmetry: Let $J_i, J_j \in Q$ and $i \neq j$. Suppose that
$J_i \succeq J_j$, i.e. $\phi(I,J_i) \le \phi(I,J_j)$. In addition,
$J_i \neq J_j$, so $\phi(I,J_i) < \phi(I,J_j)$. Moreover, according to Lemma
2.1, $\phi$ is a metric. Correspondingly, $\phi(I,J_j) < \phi(I,J_i)$ does not
hold. So, if $J_i \succeq J_j$, then $J_j \succeq J_i$ does not hold, i.e. $Q$
is antisymmetric under $\succeq$.

(3) Transitivity: Let $J_1, J_2, J_3 \in Q$ be three images corresponding to
the query image $I$, and suppose that $J_1 \succeq J_2$ and $J_2 \succeq J_3$,
i.e. $\phi(I,J_1) \le \phi(I,J_2)$ and $\phi(I,J_2) \le \phi(I,J_3)$. Moreover,
pursuant to Lemma 2.1, $\phi$ is a metric, so $\phi(I,J_1) \le \phi(I,J_3)$.

We infer: if $J_1 \succeq J_2$ and $J_2 \succeq J_3$ then $J_1 \succeq J_3$,
i.e. $Q$ is transitive under $\succeq$.

From (1), (2), (3) we infer that the set of similar images $Q \subset \Im$ is
an ordered set under the relation $\succeq$. ■


3. The data structure and image retrieval algorithm 


3.1. The S-kGraph


After creating the binary signatures and the similarity measure between
images, the problem is how to query quickly and reduce the query storage. So
we have to build a data structure that stores the binary signatures and, at
the same time, describes the relationship between the images. The paper
builds a graph structure to describe the similarity relationship based on the
binary signature (Definition 2.1) and the similarity measure (Definition
2.4). This graph structure is called the signature graph (SG); each vertex
of the graph consists of the pair of identification $oid_I$ and signature
$sig_I$ corresponding to image $I$. The weight between two vertices is the
similarity measure $\phi$. The data structure SG is defined as follows:

Definition 3.1 (Signature Graph). The signature graph $SG = (V, E)$ is the
graph which describes the relationship between the images, with the set of
vertices $V = \{(oid_I, Sig(R^I)) \mid I \in \Im\}$ and the set of edges
$E = \{(I,J) \mid \phi(I,J) = \phi(R^I, R^J) \le \theta(R^I, R^J),
\forall I, J \in \Im\}$, where $\theta(R^I, R^J)$ is a threshold value and
$\Im$ is an image database. The weight of each edge $(I,J)$ is the similarity
measure $\phi(I,J) = \phi(R^I, R^J)$.

Each vertex $v \in V$ in SG determines the $k$ elements with the nearest
similarity measure. However, with the number of images in a large database,
it is difficult to determine the set of similar images corresponding to the
query image. Therefore, we introduce the notion of the S-kGraph, in which
each vertex includes the nearest images, called k-neighboring images.

For each k-neighboring image, the paper builds a cluster of similar images.
Each cluster is represented by an item called the cluster center. A cluster
of similar images is defined as follows:

Definition 3.2. A cluster $V_i$ with center $I_i$ and radius $k_i\theta$ is
defined as $V_i = V_i(I_i) = \{J \mid \phi(I_i, J) \le k_i\theta, J \in \Im\}$,
$i = 1, \ldots, n$, $k_i \in \mathbb{N}^*$.

On the basis of clusters, the paper defines the data structure S-kGraph,
whose vertices are clusters and in which the weight between two vertices is
the similarity measure $\phi$. The data structure S-kGraph is defined as
follows:

Definition 3.3 (S-kGraph). Let $\Omega = \{V_i \mid i = 1, \ldots, n\}$ be a
set of clusters such that $V_i \cap V_j = \emptyset$, $i \neq j$. The S-kGraph
$= (V_{SG}, E_{SG})$ is a weighted graph with a vertex set $V_{SG}$ and an edge
set $E_{SG}$ defined as follows:
$V_{SG} = \Omega = \{V_i \mid \exists! I_{i_0} \in V_i, \forall I \in V_i,
\phi(I_{i_0}, I) \le k_{i_0}\theta, i = 1, \ldots, n\}$,
$E_{SG} = \{(V_i, V_j) \mid i \neq j; V_i, V_j \in V_{SG},
d(V_i, V_j) = \phi(I_{i_0}, J_{j_0})\}$, where $d(V_i, V_j)$ is the weight
between two clusters and $\forall I \in V_i$, $\phi(I_{i_0}, I) \le k_{i_0}\theta$.

Each image must be classified into a cluster of the S-kGraph, so we need
rules for distributing images into its clusters. These rules are defined as
follows:

Definition 3.4 (The Rules of Distribution of Image). Let
$\Omega = \{V_i \mid i = 1, \ldots, n\}$ be a set of clusters such that
$V_i \cap V_j = \emptyset$, $i \neq j$; let $I_0$ be an image to be
distributed in the set of clusters $\Omega$; and let $I_m$ be the center of
cluster $V_m$ such that
$(\phi(I_0, I_m) - k_m\theta) = \min\{(\phi(I_0, I_i) - k_i\theta), i = 1, \ldots, n\}$,
where $I_i$ is the center of cluster $V_i$. There are three cases:

(1) If $\phi(I_0, I_m) \le k_m\theta$ then the image $I_0$ is distributed in
cluster $V_m$.

(2) If $\phi(I_0, I_m) > k_m\theta$ then set
$k_0 = [(\phi(I_0, I_m) - k_m\theta)/\theta]$, and:

(2.1) If $k_0 > 0$ then create cluster $V_0$ with center $I_0$ and radius
$k_0\theta$, and set $\Omega = \Omega \cup \{V_0\}$.

(2.2) Otherwise (i.e. $k_0 = 0$), the image $I_0$ is distributed in cluster
$V_m$ and $\phi(I_0, I_m) = k_m\theta$.
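
The three cases can be sketched as a small decision function. The sketch assumes that the bracket $[\cdot]$ is the integer part (floor) and works on precomputed distances to the cluster centers; the function name and the return labels are hypothetical.

```python
import math


def distribute(phi_to_centers, k, theta):
    """Decide where a new image goes.

    phi_to_centers[i] is phi(I0, Ii) and k[i] is the radius multiplier of
    cluster i. Returns ("join", m), ("new", k0) or ("snap", m).
    """
    # Cluster whose boundary is closest to the image (may be negative: inside).
    m = min(range(len(k)), key=lambda i: phi_to_centers[i] - k[i] * theta)
    gap = phi_to_centers[m] - k[m] * theta
    if gap <= 0:
        return ("join", m)          # case (1): inside cluster m
    k0 = math.floor(gap / theta)    # assumption: [.] is the integer part
    if k0 > 0:
        return ("new", k0)          # case (2.1): new cluster of radius k0 * theta
    return ("snap", m)              # case (2.2): stored at distance k_m * theta


inside = distribute([0.5, 3.0], [1, 1], theta=1.0)
far = distribute([3.5, 4.0], [1, 1], theta=1.0)
near = distribute([1.5, 2.0], [1, 1], theta=1.0)
```

In case (2.2) the image lies less than one $\theta$ outside the nearest cluster, so it is attached to that cluster with its stored distance set to the radius.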

Each image must belong to a cluster of the S-kGraph so that images are
classified. Moreover, to avoid invalid data in clusters, each image is
distributed in a unique cluster. Theorem 3.1 and Theorem 3.2 show the unique
distribution.

Theorem 3.1. Given the S-kGraph $= (V_{SG}, E_{SG})$. Let
$(V_i, V_j) \in E_{SG}$ and let $I_{i_0}$, $J_{j_0}$ be the centers of $V_i$,
$V_j$, respectively. Then
$d(V_i, V_j) = \phi(I_{i_0}, J_{j_0}) > (k_{i_0} + k_{j_0})\theta$, with
$\forall I \in V_i$, $\phi(I_{i_0}, I) \le k_{i_0}\theta$ and
$\forall J \in V_j$, $\phi(J_{j_0}, J) \le k_{j_0}\theta$.

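Proof. We have $\forall I \in V_i$, $\phi(I_{i_0}, I) \le k_{i_0}\theta$ and
$\forall J \in V_j$, $\phi(J_{j_0}, J) \le k_{j_0}\theta$. For all
$I' \in Boundary(V_i)$, $J' \in Boundary(V_j)$ we have
$\phi(I_{i_0}, I') = k_{i_0}\theta$ and $\phi(J_{j_0}, J') = k_{j_0}\theta$.
Moreover, because $V_{SG} = \Omega$ is a set of disconnected clusters,
$V_i \cap V_j = \emptyset$, so that $\phi(I', J') > 0$.

We infer: $\forall I' \in Boundary(V_i)$, $\forall J' \in Boundary(V_j)$,
$\phi(I_{i_0}, I') + \phi(I', J') + \phi(J_{j_0}, J') > (k_{i_0} + k_{j_0})\theta$.
Furthermore, because $\phi$ is a metric,
$\phi(I_{i_0}, I') + \phi(I', J') + \phi(J_{j_0}, J') \ge \phi(I_{i_0}, J_{j_0})$.
And $\exists I'_0 \in Boundary(V_i)$, $\exists J'_0 \in Boundary(V_j)$ such
that
$\phi(I_{i_0}, I'_0) + \phi(I'_0, J'_0) + \phi(J'_0, J_{j_0}) = \phi(I_{i_0}, J_{j_0})$.

Therefore, $\phi(I_{i_0}, J_{j_0}) > (k_{i_0} + k_{j_0})\theta$. ■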

Theorem 3.2. If each image $I$ is distributed in a set of clusters
$\Omega = \{V_i \mid i = 1, \ldots, n\}$, then it belongs to a unique cluster.

Proof. Let $I$ be any image, and suppose that there exist two clusters
$V_i$, $V_j$ with $V_i \neq V_j$ and $(I \in V_i) \wedge (I \in V_j)$. Let
$I_i$, $I_j$ be the centers of clusters $V_i$, $V_j$, respectively; we have
$\phi(I_i, I) \le k_i\theta$ and $\phi(I_j, I) \le k_j\theta$. Thus
$\phi(I_i, I) + \phi(I_j, I) \le (k_i + k_j)\theta$. Furthermore, because
$\phi$ is a metric, we have $\phi(I_i, I) + \phi(I_j, I) \ge \phi(I_i, I_j)$.
On the other hand, $I_i$, $I_j$ are the centers of clusters $V_i$, $V_j$, so
by Theorem 3.1 $\phi(I_i, I_j) > (k_i + k_j)\theta$. Hence
$\phi(I_i, I) + \phi(I_j, I) \ge \phi(I_i, I_j) > (k_i + k_j)\theta$ and
$\phi(I_i, I) + \phi(I_j, I) \le (k_i + k_j)\theta$.

This is a contradiction, so the supposition is false, i.e. each image $I$ is
distributed in a unique cluster. ■

In order to avoid invalid data, the rules of distribution (Definition 3.4)
need to ensure that each image is classified into a unique cluster. Theorem
3.3, Theorem 3.4 and Theorem 3.5 address this problem.

Theorem 3.3. If the value $\phi(I, I_m) - k_m\theta \le 0$, then this occurs
for one unique $I_m$.

Proof. Suppose that there exists a center $I_0$ of a cluster $C_0 \in \Omega$
such that $\phi(I, I_0) - k_0\theta \le 0 \Leftrightarrow
\phi(I, I_0) \le k_0\theta$, i.e. $I$ belongs to cluster $C_0$. On the other
hand, according to the hypothesis, $\phi(I, I_m) - k_m\theta \le 0$, i.e. $I$
belongs to cluster $C_m \neq C_0$. This means that $I$ belongs to two
different clusters, whereas pursuant to Theorem 3.2 each image $I$ belongs to
a unique cluster. Thus the supposition is false. We infer that if the value
$\phi(I, I_m) - k_m\theta \le 0$, it occurs for one unique $I_m$. ■

Theorem 3.4. If $\Omega = \{V_i \mid i = 1, \ldots, n\}$ is a set of clusters
and $I$ is an image, then there exists a cluster $V_{i_0} \in \Omega$ such
that $I \in V_{i_0}$.

Proof. According to Definition 3.4, for any image $I$ there exists a cluster
$V_{i_0} \in \Omega$ such that $I \in V_{i_0}$. ■

Theorem 3.5. Each image $I$ is distributed in a unique cluster
$C_{i_0} \in \Omega$.

Proof. According to Definition 3.4, for any image $I$ there exists a cluster
$V_{i_0} \in \Omega$ such that $I \in V_{i_0}$. According to Theorem 3.2, any
image $I$ is distributed in only one cluster. We infer that any image $I$ is
distributed in a unique cluster $C_{i_0} \in \Omega$. ■

3.2. Extracting the feature regions 


In order to execute the similar-image retrieval process according to the
proposed theory, we first extract the feature regions of the image. The paper
presents a method to extract the feature regions based on the interest points
of the image. These interest points are extracted from the intensity with the
Harris-Laplace detector.


In order to extract the visual features of an image, the first step is to
standardize the image size. Let $Y$, $Cb$, $Cr$ be the intensity, blue-color
and red-color components, respectively. According to [3], [4], the Gaussian
transformation following the human visual system is computed as
$L(x,y) = \frac{1}{10}[6 \cdot G(x,y,\delta_D) * Y + 2 \cdot G(x,y,\delta_D) * Cb
+ 2 \cdot G(x,y,\delta_D) * Cr]$, where $G(x,y,\delta_D)$ is a Gaussian kernel,
$G(x,y,\delta_D) \propto \exp\left(-\frac{x^2 + y^2}{2\delta_D^2}\right)$. The
intensity $I_0(x,y)$ for a color image is calculated according to the
equation $I_0(x,y) = Det(M(x,y)) - \alpha \cdot Tr^2(M(x,y))$, where
$Det(\cdot)$, $Tr(\cdot)$ are the determinant and the trace of a matrix,
respectively. $M(x,y)$ is the second moment matrix

$M(x,y) = \delta_D^2 \cdot G(\delta_I) * \begin{bmatrix} L_x^2 & L_x L_y \\
L_x L_y & L_y^2 \end{bmatrix}$,

where $\delta_I$, $\delta_D$ are the integration scale and the differentiation
scale, and $L_a$ is the derivative computed in the $a$ direction. The interest
points of a color image are extracted according to the condition
$I_0(x,y) > I_0(x',y')$ for all $(x',y') \in A$ and $I_0(x,y) \ge \theta$,
where $A$ is the neighborhood of the point $(x,y)$ and $\theta$ is a threshold
value.
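
The last two formulas can be illustrated with a small pure-Python sketch. It assumes that the entries of the second moment matrix have already been computed, takes $\alpha = 0.06$ as a default (the text does not state a value), and uses the 8-neighborhood as $A$; the function names are hypothetical.

```python
def harris_response(a: float, b: float, c: float, alpha: float = 0.06) -> float:
    """Response Det(M) - alpha * Tr(M)^2 for the symmetric matrix M = [[a, b], [b, c]]."""
    return (a * c - b * b) - alpha * (a + c) ** 2


def interest_points(response, theta):
    """Points whose response is >= theta and strictly greater than all 8 neighbours."""
    rows, cols = len(response), len(response[0])
    points = []
    for y in range(rows):
        for x in range(cols):
            value = response[y][x]
            if value < theta:
                continue
            neighbours = [
                response[j][i]
                for j in range(max(0, y - 1), min(rows, y + 2))
                for i in range(max(0, x - 1), min(cols, x + 2))
                if (i, j) != (x, y)
            ]
            if all(value > n for n in neighbours):
                points.append((x, y))
    return points


grid = [
    [0, 0, 0, 0],
    [0, 5, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 3],
]
points = interest_points(grid, theta=2)
```

A matrix with one dominant eigenvalue (an edge) gives a negative response, while two large eigenvalues (a corner) give a positive one, which is why thresholding the response selects corner-like points.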

Let $O_I = \{o_I^1, o_I^2, \ldots, o_I^n\}$ be a set of feature circles
centered at the interest points, with a set of feature radii
$R_I = \{r_I^1, r_I^2, \ldots, r_I^n\}$. The values of the feature radii are
extracted with the LoG (Laplace-of-Gaussian) method and lie in
$[0, \min(M, N)/2]$, where $M$, $N$ are the height and the width of the image.
For each image, the process of extracting interest points is described as
follows:



Figure 1. A sample result of extracting feature region 
Step 1. Convert from the RGB color space to the YCbCr color space.

Step 2. Perform the Gaussian transform for the human visual system to
calculate $L(x,y)$.

Step 3. Calculate the feature intensity $I_0(x,y)$ for color images. Then
collect the set of interest points.

Step 4. Extract the feature regions $O_I = \{o_I^1, o_I^2, \ldots, o_I^n\}$
based on the interest points.

3.3. Binary signature of the image 


After extracting the feature regions of an image, we need to create binary
signatures to describe them. On the basis of the binary signatures, we
perform the similar-image retrieval process of the proposed theory.

For each feature region $o_I^i \in O_I$ of image $I$, the histogram is
calculated on the basis of the standard color range $C$. An effective
clustering method relying on the Euclidean measure in the RGB color space
classifies the color of every pixel of the image. Let $p$ be a pixel of image
$I$ with RGB color vector $V_p = (R_p, G_p, B_p)$, and let
$V_m = (R_m, G_m, B_m)$ be a color vector of the standard color range $C$
such that $\|V_p - V_m\| = \min\{\|V_p - V_i\|, V_i \in C\}$. Then pixel $p$ is
standardized to the color vector $V_m$. In the experiments, the paper uses
the standard color range of MPEG7 to calculate the histograms of the color
images in the COREL database.
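
The standardization of pixel colors and the resulting histogram can be sketched as below. The five-color `PALETTE` is a hypothetical stand-in for the standard color range $C$ (it is not the MPEG7 color set), and the function names are illustrative.

```python
# Hypothetical standard color range C: black, white, red, green, blue.
PALETTE = [(0, 0, 0), (255, 255, 255), (255, 0, 0), (0, 255, 0), (0, 0, 255)]


def nearest_color_index(pixel, palette=PALETTE) -> int:
    """Index m of the palette color V_m minimizing the Euclidean distance to the pixel."""
    return min(
        range(len(palette)),
        key=lambda m: sum((p - q) ** 2 for p, q in zip(pixel, palette[m])),
    )


def histogram(pixels, palette=PALETTE):
    """Count how many pixels are standardized to each palette color."""
    counts = [0] * len(palette)
    for pixel in pixels:
        counts[nearest_color_index(pixel, palette)] += 1
    return counts


region_pixels = [(250, 10, 10), (10, 10, 10), (240, 240, 240), (200, 30, 20)]
counts = histogram(region_pixels)
```

Comparing squared distances avoids the square root without changing which color is nearest.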

Let $o_I^i \in O_I$ $(i = 1, \ldots, n)$ be a feature circle of image $I$; the
histogram vector of the circle $o_I^i$ is
$H(o_I^i) = \{H_1(o_I^i), \ldots, H_n(o_I^i)\}$. Setting
$h_k(o_I^i) = H_k(o_I^i) / \sum_j H_j(o_I^i)$, the standardized histogram
vector is $h(o_I^i) = \{h_1(o_I^i), \ldots, h_n(o_I^i)\}$. Then the binary
signature describing $h_k(o_I^i)$ is $B_i^k = b_1^k b_2^k \ldots b_m^k$ with
$b_j^k = 1$ if $j = [(h_k(o_I^i) + 0.05) \times m]$, otherwise $b_j^k = 0$.
So the signature describing the feature region $o_I^i \in O_I$ is
$Sig(o_I^i) = B_i^1 B_i^2 \ldots B_i^n$. For this reason, the binary signature
of image $I$ is $S_I = \bigcup_{i=1}^{n} Sig(o_I^i)$. The process of creating
binary signatures for color images is described as follows:

Step 1. Calculate the histogram vector
$H(o_I^i) = \{H_1(o_I^i), H_2(o_I^i), \ldots, H_n(o_I^i)\}$ of each feature
region $o_I^i \in O_I$ with the set of standard colors $C$.

Step 2. For each feature region $o_I^i \in O_I$, standardize the histogram
vector as $h(o_I^i) = \{h_1(o_I^i), h_2(o_I^i), \ldots, h_n(o_I^i)\}$.

Step 3. Create the binary signature for $h_k(o_I^i)$ as
$B_i^k = b_1^k b_2^k \ldots b_m^k$ with $b_j^k = 1$ if
$j = [(h_k(o_I^i) + 0.05) \times m]$, otherwise $b_j^k = 0$. The signature
describing the feature region $o_I^i \in O_I$ is
$Sig(o_I^i) = B_i^1 B_i^2 \ldots B_i^n$.

Step 4. Create the binary signature of image $I$ as
$S_I = \bigcup_{i=1}^{n} Sig(o_I^i)$.
3.4. Creating S-kGraph


On the basis of the similarity measure $\phi$, the S-kGraph of Definition 3.3
and the rules of distribution of Definition 3.4, the paper proposes an
algorithm to create the S-kGraph data structure. With the input image
database $\Im$ and the threshold $k\theta$, we need to return the S-kGraph.
First, we initialize the vertex set $V_{SG} = \emptyset$ and the edge set
$E_{SG} = \emptyset$, and then create the first cluster. For each image $I$ we
evaluate the distance $\phi$ to the center of each cluster and find the
nearest cluster according to
$(\phi(I, I_0^m) - k_m\theta) = \min\{(\phi(I, I_0^i) - k_i\theta), i = 1, \ldots, n\}$.
If the condition $\phi(I, I_0^m) \le k_m\theta$ is satisfied, the image $I$ is
distributed in cluster $V_m$. Otherwise, we apply the rules of distribution
of Definition 3.4 to classify the image $I$ into the appropriate cluster. The
algorithm is as follows:
Algorithm 1. Create the S-kGraph
Input: Image database $\Im$ and threshold $k\theta$
Output: S-kGraph $= (V_{SG}, E_{SG})$

$V_{SG} = \emptyset$; $E_{SG} = \emptyset$; $k_I = 1$; $n = 1$;
for $\forall I \in \Im$ do
    if $V_{SG} = \emptyset$ then
        $I_0^n = I$; $r = k_I\theta$;
        Initialize cluster $V_n = (I_0^n, r, \phi = 0)$;
        $V_{SG} = V_{SG} \cup V_n$;
    else
        $(\phi(I, I_0^m) - k_m\theta) = \min\{(\phi(I, I_0^i) - k_i\theta), i = 1, \ldots, n\}$;
        if $\phi(I, I_0^m) \le k_m\theta$ then
            $V_m = V_m \cup (I, k_m\theta, \phi(I, I_0^m))$;
        else
            $k_I = [(\phi(I, I_0^m) - k_m\theta)/\theta]$;
            if $k_I > 0$ then
                $I_0^{n+1} = I$; $r = k_I\theta$;
                Initialize cluster $V_{n+1} = (I_0^{n+1}, r, \phi = 0)$;
                $V_{SG} = V_{SG} \cup V_{n+1}$;
                $E_{SG} = E_{SG} \cup \{(V_{n+1}, V_i) \mid \phi(I_0^{n+1}, I_0^i) \le k\theta, i = 1, \ldots, n\}$;
                $n = n + 1$;
            else
                $\phi(I, I_0^m) = k_m\theta$;
                $V_m = V_m \cup (I, k_m\theta, \phi(I, I_0^m))$;
            end if
        end if
    end if
end for
Return S-kGraph $= (V_{SG}, E_{SG})$;
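
A compact Python sketch of Algorithm 1 follows. It is an illustration under stated assumptions rather than the implementation used in the experiments: `phi` is any distance function supplied by the caller, the bracket $[\cdot]$ is taken as the integer part, the edge threshold $k\theta$ is passed as `k_edge * theta`, and the `Cluster` class and all names are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Cluster:
    center: object
    k: int                                        # radius is k * theta
    members: list = field(default_factory=list)   # (image, stored distance)


def build_skgraph(images, phi, theta, k_edge):
    """Distribute images into clusters and link clusters whose centers are close."""
    clusters, edges = [], set()
    for img in images:
        if not clusters:                          # first image opens the first cluster
            clusters.append(Cluster(img, 1, [(img, 0.0)]))
            continue
        m = min(range(len(clusters)),
                key=lambda i: phi(img, clusters[i].center) - clusters[i].k * theta)
        nearest = clusters[m]
        d, radius = phi(img, nearest.center), nearest.k * theta
        if d <= radius:                           # inside the nearest cluster
            nearest.members.append((img, d))
            continue
        k_new = math.floor((d - radius) / theta)  # assumption: [.] is the floor
        if k_new > 0:                             # far enough: open a new cluster
            new_index = len(clusters)
            for i, c in enumerate(clusters):
                if phi(img, c.center) <= k_edge * theta:
                    edges.add((new_index, i))
            clusters.append(Cluster(img, k_new, [(img, 0.0)]))
        else:                                     # just outside: snap to the boundary
            nearest.members.append((img, radius))
    return clusters, edges


# Toy data: "images" are points on a line and phi is the absolute difference.
clusters, edges = build_skgraph([0, 1, 5, -1.5], lambda a, b: abs(a - b),
                                theta=1.0, k_edge=6)
```

In this toy run the points 0, 1 and -1.5 end up in the first cluster (the last one snapped to its boundary), the point 5 opens a second cluster of radius 4, and one edge links the two clusters.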
3.5. Image retrieval algorithm


After creating the S-kGraph, we need to query similar images on it. For each
query image $I_Q$, we need to find the set of similar images IMG. This query
process finds the nearest cluster in the S-kGraph with
$\phi_{min} = \min\{\phi(I_Q, I_0^i), i = 1, \ldots, n\}$. In addition, we need
to query the similar images at adjacent vertices whose measure is less than
the threshold $k\theta$. This algorithm is described as follows:

Algorithm 2. Image Retrieval Algorithm based on S-kGraph
Input: query image $I_Q$, S-kGraph $= (V_{SG}, E_{SG})$, threshold $k\theta$
Output: set of similar images IMG

IMG $= \emptyset$; $V = \emptyset$;
$\phi_{min} = \phi(I_Q, I_0^m) = \min\{\phi(I_Q, I_0^i), i = 1, \ldots, n\}$;
for $V_i \in V_{SG}$ do
    if $\phi(I_0^m, I_0^i) \le k\theta$ then
        $V = V \cup V_i$;
    end if
end for
for $V_j \in V$ do
    IMG = IMG $\cup \{I_k^j, I_k^j \in V_j, k = 1, \ldots, |V_j|\}$;
end for
Return IMG;
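
The retrieval step can be sketched as follows. The sketch assumes that a cluster is "adjacent" when the distance between its center and the center of the nearest cluster is at most $k\theta$, which is one reading of the condition in the listing; clusters are given as hypothetical `(center, members)` pairs and `phi` is supplied by the caller.

```python
def query_skgraph(query, clusters, phi, k_theta):
    """Collect the members of the nearest cluster and of clusters close to it.

    clusters is a list of (center, members) pairs.
    """
    # Nearest cluster: the center with minimal phi to the query image.
    nearest_center, _ = min(clusters, key=lambda c: phi(query, c[0]))
    result = []
    for center, members in clusters:
        # Assumption: adjacency is measured between cluster centers.
        if phi(nearest_center, center) <= k_theta:
            result.extend(members)
    return result


# Toy data: points on a line with the absolute difference as phi.
toy_clusters = [(0, [0, 1]), (5, [5, 6]), (20, [20])]
found = query_skgraph(4, toy_clusters, lambda a, b: abs(a - b), k_theta=6)
```

Only the cluster centers are compared against the query, so the number of comparisons depends on the number of clusters rather than on the number of images.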


4. Experiments 


4.1. Model of image retrieval system 

Phase 1 : Perform pre-processing 

Step 1. Extract feature regions of the images in database into the form of 
feature vectors. 

Step 2. Convert the feature vectors of the image into the form of binary
signatures.

Step 3. Calculate the similarity measure among the binary signatures of
the images and insert them into the S-kGraph.

Phase 2: Implement Query 

Step 1. For each query image, we extract the feature vector and convert 
into binary signature. 
Figure 2. The model of RBIR using S-kGraph
Figure 3. A sample result of image retrieval based on S-kGraph

Step 2. Perform the process of binary signature retrieval on the S-kGraph to
find the similar images.

Step 3. After obtaining the similar images, we sort them from high to low
similarity and give a list of the images on the basis of the similarity of
their binary signatures.

4.2. The experimental results 


The experiments are performed on the COREL sample data [6], which includes
10,800 images divided into 80 different subjects. For each query image, we
retrieve images from the COREL data so as to find those most similar to the
query image. Then we compare the result with the list of subjects of the
images to evaluate the accuracy of the method.

Binary signatures are stored in two forms of query structure: SSF (sequential
signature file) and S-kGraph. Fig. 6 and Fig. 7 show the empirical figures
for the similar-image retrieval process on COREL images.
Figure 4. Number of comparisons to create S-kGraph

Figure 5. The time to create S-kGraph

Figure 6. Number of comparisons to query image

Figure 7. The time to query image

Figure 8. Recall


5. Conclusion


The paper gives a method for evaluating the similarity between two images on
the basis of binary signatures and creates the S-kGraph to describe the
relationship between images. As a result, the paper builds a model of an
image retrieval system based on feature regions and uses it in experiments on
COREL's image data. According to the experimental results, the method based
on the S-kGraph queries similar images faster than querying the SSF
(sequential signature file). However, using only color features gives
inaccurate results with respect to the meaning of the image content.
Therefore, the next development is to extract objects in the image and to
build binary signatures that describe these objects as well as the contents
of images. On the basis of these binary signatures, the similarity measure
can be assessed and the set of images similar to the query image returned.